ReLU-FCM trained by quasi-oppositional bare bone imperialist competition algorithm for predicting employment rate

Fuzzy cognitive maps (FCMs) are a powerful tool for system modeling, which can be used for static and dynamic analysis. However, traditional FCMs are usually learned by gradient-based methods, and the adopted sigmoid nonlinear activation function frequently causes gradient saturation. These two shortcomings set a limit on the modeling accuracy. To overcome those problems, we propose in this paper a new FCM with two improvements. First, the rectified linear unit (ReLu) activation function is adopted to replace the sigmoid function. Second, a newly proposed quasi-oppositional bare bone imperialist competition algorithm (QBBICA) is used to learn the FCM. The improved FCM is used to predict the employment rate of graduates from Liren College, Yanshan University. Experimental results show that the improved FCM is effective in employment rate prediction.


Introduction
Improving the employment rate and employment quality of university graduates has been an important task for China's education authority and universities. Based on the employment status of graduates, the education authority can adjust relevant policies and the university's faculty can adjust their education and training plans to enhance the graduates' employability. To guarantee sufficient employment of graduates, the responsible decision-makers need to make adjustments ahead of time. Adequate supporting data is necessary to ensure that the decisions are made in a scientific manner. Hence, accurately predicting the employment rate of graduates becomes an important topic of research.
Since the employment rate of graduates is affected by many factors, accurate prediction of graduates' employment rate remains a difficult task. In the past years, many researchers have paid much attention to the prediction of employment rate of graduates. In [1], Wang adopted a neural network with an adaptive learning back-propagation framework to construct an intelligent employment rate prediction model. In [2], Guo et al. developed a framework called Multi-major Employment Status to predict graduates' employment status when bias exists. In the framework, a long short term memory (LSTM) with dropout was adopted as the predictor. In [3], an employment rate prediction model was constructed using the back propagation (BP) of GA. Compared to GA, particle swarm optimization (PSO) has simpler evolutionary operations. Therefore, in [17], PSO was adopted to learn FCMs. The differential evolution (DE) [18], the asexual reproduction optimization (ARO) algorithm [19], and the imperialist competition algorithm (ICA) [20] have also been adopted to learn FCMs. ICA is an effective evolutionary algorithm first proposed by Atashpaz-Gargari and his colleagues [21]. It shows great potential in solving different complex optimization problems, such as flexible job shop scheduling [22], neural network training [23], parameter optimal design of heat exchangers [24], controller parameter tuning [25], and node placement in wireless sensor networks [26]. However, like many other evolutionary algorithms, ICA is also plagued by the premature problem. That is to say, if ICA finds a local optimal solution and fails to jump out of it, ICA will converge to the local optimum, which is undesired. To enhance the search ability of ICA, several ICA variants have been proposed. These variants can be roughly classified into three classes. The first class is to change the assimilation strategy. For example, in [27], an ICA variant called AR-ICA was proposed, where the assimilation coefficients are dynamically adjusted based on the concept of attraction and repulsion to better balance the diversity and convergence of the algorithm. In [28,29], a modified ICA was proposed, in which the colonies move to its imperialist using a velocity-position model like PSO algorithm. In [30], a barebone ICA (BB-ICA) was given. In BB-ICA, the new position of each colony is determined according to a Gaussian sampling. In [31], an ICA variant called Gbest-ICA was proposed via taking the difference between the colony's current position and the global best position into account when colony moves towards its imperialist. An orthogonal ICA (O-ICA) was proposed in [32], where a new move direction orthogonal to the original move direction is added to the colonies' movement formula. The other class of ICA variants is to modify the original ICA's competition and revolution step. In [33], the chaotic map was used to generate the revolted colonies, while in [34], the GA mutation operator is used to generate new revolted colonies to replace some original colonies. In [35], a clone evolution operator is introduced into the competition phase. Apart from the above two classes of variants, some researchers hybrid ICA with other evolutionary algorithms to improve the performance of ICA. For instance, in [36], the hybrids of ICA and simplex method was proposed. In the hybrid algorithm, the simplex method is responsible for a local searching. In [37], the ICA is hybridized with simulating annealing (SA), in which the SA is used to improve the position of imperialist after assimilation step. In [38], ICA is mixed with pattern search algorithm, the results of ICA severing as initial position are further improved using pattern search.
With the above facts in mind, we develop in this paper an improved FCM in which two improvements are made. First, a ReLU function is adopted to replace the original sigmoid function to enhance the nonlinear mapping ability. Second, an improved ICA, which integrates the quasi-opposition learning and bare bone ICA, is proposed to learn FCMs' weights. The proposed method is used to predict the employment rate of university graduates.
The rest of this paper is arranged as follows. Some basics about FCMs are provided in Section 2. The bare-bone imperialist competitive algorithm is detailed in Section 3. The proposed method for predicting the employment rate of graduates is presented in Section 4. The prediction results are shown in Section 4.4. Finally, the concluding remarks are given in Section 5.2.

ReLU-FCM
FCMs, as shown in Fig 1, can be viewed as a fuzzy version of Axelrod's cognitive maps. The causality between the nodes in an FCM can be a fuzzy degree. Like cognitive maps, an FCM consists of a set of nodes and weighted edges. The nodes represent the concepts or variables of a system while the weighted edges denote the causality between these concepts or variables.
Suppose there are n nodes in an FCM, denoted as {C 1 , C 2 , � � �, C n }. The weight of the edge from node C j to C i is denoted by w ji , where i, j = 1, 2, � � �, n. The value of w ji lies in [−1, 1]. The value of weight w ji explains the degree of the causality between the nodes, and the sign of weight w ji denotes the influential direction of the connected nodes. Specifically, if w ji > 0, it means that an increase of node C j results in an increase of node C i with strength w ji and vice versa. If w ji < 0, it means an increase of node C j results in a decrease of node C i with strength | w ji | and vice versa. If w ji = 0 means there is no relation between node C j and node C i , the weights of all edges in an FCM can be rewritten into a matrix form as follows: In the FCM, each node at time t has a value, called activation or state value, denoted by a i (t).
The state values of nodes vary with time. At time t + 1, the state value is calculated as: where f(�) is the transfer function or activation function. Usually, the sigmoid function can be expressed as: is selected as the transfer function. As stated before, the sigmoid function has several drawbacks that limit its nonlinear mapping performance. In this paper, the ReLU function is adopted to replace the sigmoid function. The ReLU function is defined as: Denote a(t) = [a 1 (t), a 2 (t), � � �, a n (t)] T . Then (2) can be written into the following form: In (5), the state value of each node at time t + 1 depends on the state values of all nodes at time t only. If the state value of each node at time t + 1 depends on the state values of all nodes at both time t and time t − 1, � � �, t − k + 1, i.e., then, the FCM is called a high-order FCM. Eq (6) can be written in a matrix form as follows: To enable the FCM to describe the complex dynamics of a system, we must determine its weights first. The weights can be determined in two ways, by using expert knowledge and by learning from actual data. In this paper, the weights of the FCM are obtained by learning. An improved ICA called BICA is developed to perform the learning.

Quasi-opposition learning
Opposition-based learning (OBL) was first introduced by Tizhoosh in 2005 [39] inspired by the Ying-Yang concept. OBL aims to find better solutions for machine learning tasks in an opposition direction. Since its introduction, OBL has been applied in several different fields such as neural network training and evolutionary algorithm construction [39]. OBL can be used to improve the performance of original methods. However, the opposition points frequently exceed the searching range and are therefore unable to produce higher performance. To mitigate this problem, in this paper, we focus on a variant of OBL, i.e., quasi-opposition based learning (QOBL). The related concepts are stated as follows.
Definition 1 (Opposite number) Let x 2 [a, b] be a real number. Its oppositex is defined as: where each element ofx is given as: Definition 3 (Quasi-opposite number) Let x 2 [a, b] be a real number. Its quasi-opposite number � x is defined as: where each element of � x is given as:

Bare bone ICA (BBICA)
BICA involves the same main steps as basic ICA, including initialization, assimilation, imperialistic competition, and convergence. These steps are described in detail in the following sections.

Initialization.
In this step, the main task is to generate initial countries, and then classify them into imperialists and colonies according to their power. Furthermore, the colonies are probably assigned to each imperialist to form empires. Suppose a country is represented by a vector x i = [x i1 , x i2 , � � �, x iD ] and each element x ij of vector x i lies in [l j , u j ], which is randomly generated as: where rand is a random number distributed in [0, 1]. To evaluate each country, the cost of each country is calculated as where Fð�Þ is called the cost function. Once N countries are generated, they will be classified into imperialists and colonies according to their power, i.e., the cost of each country. Usually, the first N imp countries are selected as the imperialists and the remaining N col = N − N imp countries are classified as colonies. Then, the colonies are assigned to each imperialist. The n-th imperialist occupies NC n colonies, which are calculated as: where p n ¼ c n À max i¼1;2;���;Nfc i g P N imp i¼1 ðc n À max i¼1;2;���;N fc i gÞ is the normalized power of an imperialist and c n is the cost of the n-th imperialist. The NC n colonies together with the imperialist make up the n-th empire.

Assimilation.
After initialization, the colonies in each empire move towards the corresponding imperialists to increase the total power of the empire. This process is called the assimilation step. In the original ICA, the colonies move towards their corresponding imperialists as shown in Fig 2. However, the assimilation is an important step, which is responsible for the exploitation and exploration of the ICA. A good strategy should strike a good balance between exploration and exploitation. To this end, a Gaussian sampling technique is adopted as the assimilation strategy in the BICA. Specifically, the new position of a colony is determined as: where x 0 i;j represents the j-th variable of x 0 i , i.e., the new position of the i-th colonies in an empire, and N(μ j , σ j ) is a Gaussian distribution random number with mean μ j and variance σ j . The mean and variance are calculated as: and where x imp j is the j-th variable of the imperialist position.

Imperialistic competition.
After assimilation, the power of each empire is enhanced. Then, the competition among empires begins. The competition leads more powerful empires to possess more colonies and the weaker ones to lose their colonies. The lost colonies from the weaker empires are reassigned to other more powerful empires via competition. Let the ℓ-th empire be the weakest. The probability of the n-th empire to possess the colonies is given as: where NTC n = TC n − max i {TC i }, i = 1, 2, � � �, N imp and TC i is the total cost of the i-th empire. Define a probability vector P as and a probability vector R as The difference vector between P and R is The empire whose relevant index is maximum in D will occupy the weakest colony of the weakest empire.

Convergence.
After empire competition, all the empires will collapse, except the most powerful one, and all the colonies will belong to this empire. As a consequence, all the colonies along with the unique imperialist will be located in the same position and have the same cost. In such a condition, the competition is stopped and the algorithm is terminated.

Quasi-oppositional bare-bone ICA (QBBICA)
To further improve the global searching performance of BBICA, in this paper, quasi-opposition learning is combined with BBICA. The resulting algorithm is called QBBICA. The difference between BBICA and QBBICA lies in the initialization and assimilation steps.

Quasi-oppositional initialization.
In QBBICA, after randomly generating N countries according to Eq (12), another N corresponding quasi-oppositional countries are generated according to the concept of quasi-oppositional point (11). Then, we calculate the fitness values of the 2N countries and sort them in an ascending order. The first N countries are selected as the initial population.

Quasi-oppositional assimilation.
In QBBICA, after all the colonies in an empire have moved towards their corresponding imperialists according to Eq (15), a set of quasioppositional colonies are generated according to Eq (11). Similarly, we calculate the fitness values of new colonies and their corresponding quasi-oppositional colonies, and sort the values in an ascending order. The first N colonies are selected the final colonies in the empire. For complex optimization problems with many local optimum, exploring the information of oppositional point of current location may provide more help to find global optimum. The reasons are as follows. First, the oppositional point maybe result in a better function value, and can speed up the convergence of ICA algorithm. Second, the oppositional direction of current location maybe provide a better descent direction and thus has a large probability to escape the local optimal. Therefore, the oppositional operator can greatly improve the performance of ICA algorithm.

Computation complexity of QBBICA.
QBBICA mainly consists of four steps, i.e., initialization, assimilation, revolution and imperialistic competition. For a D dimension problem and QBBICA with N countries and the maximum iteration number iter max , the computation complexity for generating N initial countries and evaluate its fitness is O (N � D + N � nF), where nF is a number of function evaluation. The computation complexity of generating N quasi-oppositional countries and fitness evaluation is O(N � D + N � nF). Therefore, in the initialization step, the computation complexity of QBBICA is 2O(N � D + N � nF). In the iterative process, the computation complexity of assimilation step is, similarly, 2O(iter max � (N � D + N � nF)). The computation complexity of revolution step is O(iter max � (rr � N + nF � rr � N) and that of imperialist competitive is O(iter max � (N imp + N imp � N col )), where rr is the revolution rate and N imp the number of imperialist, N col the number of colonies. Finally, the computation complexity of QBBICA is O(iter max � N � D � nF).

Predicting employment rate of graduates using proposed method
In this section, we describe in detail how our proposed method is used to predict the employment rate of graduates. Inspired by the work of Wojciech [40], we divide the prediction process into five steps. First, the historical employment rate series is transformed into a fuzzy time series (FTS). Second, the FTS is used to construct an FCM with the ReLU activation function. Third, the FCM is trained using QBBICA. Then, the trained FCM evolves into one or more steps to obtain a predicted fuzzy state. Finally, the predicted fuzzy state is defuzzified to obtain the numerical predicted values of employment rate.

Training of FCM using QBBICA
After the FCM is constructed, the weights of FCM should be determined through training. In the literature, many learning algorithms have been developed for FCM learning, such as Hebbian-based, population-based and hybrid algorithms. See [41] for more details. In this paper, the proposed QBBICA is used to learn the FCM. The main steps are illustrated as follows.
Step 1 Preparing employment rate data {y(t), t = 1, 2, � � �, n}. The data is normalized into [0, 1]. The first 60% is used as training samples and the remaining as test samples. The employment rate data is normalized and then transformed into m fuzzy sets with the triangle membership function. Consequently, the m FTS f i (t) is obtained.
Step 2 Constructing the FCM. According to the order k of the FCM and the node number m, an FCM with m nodes is constructed. The status value of FCM at time t is f 1 (t), f 2 (t), � � �, f m (t).
Step 3 Determining the algorithm parameters of QBBICA such as the number of countries, the number of empires, and the maximum number of iterations.
Step 4 Solution representation. If the FCM has m nodes, then each weight matrix is a m × m matrix. Furthermore, if the order is k, then there are k weight matrices. Therefore, a vector with length k × m × m is used to represent each country in QBBICA.
Step 5 Initialization. In the FCM, each weight lies in [−1, 1]. In this step, N vectors with length k × m × m in [−1, 1] are randomly generated. The r-th country is denoted as x r .
Step 6 Calculating the fitness of each country. To evaluate the power of each country, their fitness is calculated. First, each vector x r is reshaped into k matrices with size m × m, which is denoted as w j i . Then, the fitness of each country x r is defined as: whereâ is the estimated state and a i (t) is the target state.
Step 7 According the fitness values of the countries, the whole countries are classified into colonies and imperialists and assigned to their corresponding imperialists to form empires.
Step 8 Perform assimilation on each empire as described in subsection 3.2.2.
Step 9 Perform empire competition as described in subsection 3.2.3.
Step 10 Repeat Step 7 and Step 8 until the convergence condition is met.
Step 11 Output the best vector with the minimum fitness value. Reshaping the vector into weight matrices.

Predicting employment rate using FCM
After the FCM is trained, it is used to predict the employment rate. Suppose that the state value of the FCM at time t is {a 1 (t), a 2 (t), � � �, a m (t)}, where each a i (t) represents the degree of y (t) belonging to the i-th fuzzy set, it is a linguistic variable. After the weights of the FCM are determined, the FCM evolves into a step according to Eq (7) to obtain the predicted state values at t + 1, i.e.,â i ðt þ 1Þ. The linguistic valueâ i ðt þ 1Þ is transformed into a numerical value using the defuzzification technique. The predicted numerical valueŷðt þ 1Þ is obtained as: where mp i is the center of the i-th fuzzy set.

Performance of QBBICA
Here, the optimization performance of the proposed QBBICA is validated using several benchmark functions. The selected benchmark functions are listed in Table 1. In those six functions, the Sphere function is unimodal and the others are multimodal. To show the advantage of the proposed QBBICA, we compare it with four other algorithms, i.e., GA, PSO, ICA, and BBICA. The parameter settings of these algorithms are as follows. For GA, the "tournament" selection, the "polynomial mutation" and the "Simulated Binary Crossover (SBX)" are adopted. For PSO, c 1 = c 2 = 2, and the inertia weight decreases linearly from w max = 0.9 to w min = 0.4. For DE, the crossover rate CR = 0.8, mutation factor F = 0.8, and "DE / rand-to-best / 1 / bin" strategy is adopted. The parameters for ICA, BBICA, and the proposed QBBICA are set as in [21], which are set as follows, the assimilation coefficient γ = 2, revolution rate equals to 0.3, the number of initial imperialist is 8. For each algorithm, the population size is 50 and the maximum number of iterations is 1000. All the algorithms independently run 30 times. Table 2 shows the optimization results of benchmark functions with 10 dimension obtained by each algorithm, including the best, the worst and the mean fitness. It can be seen that our proposed QBBICA achieves better optimization results than the other four algorithms. To compare the convergence speeds of these algorithms, Fig 3 shows the fitness convergence curves of the algorithms. It is clear that our proposed QBBICA has a faster convergence speed than the other algorithms. To sum up, our proposed QBBICA not only has a stronger global  -32,32] f(x � ) = 0 ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi search ability but also a faster convergence speed than the other algorithms. To compare the superiority of the proposed QBBICA, a Wilcoxon sign-rank-sum test had been conducted and the h values of the test are listed in Table 3. The h-value equaling to 1 indicates that QBBICA is better than other algorithm. From Table 3, one can see that the proposed QBBICA performs better than other algorithms except for PSO on sphere function. Furthermore, to demonstrate the computational efficiency, the average computation time of each algorithm during the 30 runs for each function are listed in Table 4.

Predicting employment rate
In this section, experiments are conducted to predict the employment rate of graduates from Liren College, Yanshan University. The employment rate is the ratio of the number of the employed graduates to the total number of graduates. The employment rate data of Liren College from 2005 to 2020 is used in the experiments, which is listed in Table 5.
The first 60% of the employment rate data is used as the training sample and the remaining 40% as the test sample.
In the experiments, five triangle membership functions are adopted to transform the employment rate into an FTS. Thus, there are five nodes in the FCM. The order of the FCM is set to 2 in this paper. Thus, there are two weight matrices and the size of each matrix is 5 × 5. For the purpose of comparison, the RCGA, PSO, ICA, BBICA and the proposed QBBICA are implemented and used to train the ReLU-FCM and the original FCM, denoted as Sig-FCM. For the sake of convenience, the FCMs trained by these algorithms are denoted as RCGA-FCM, PSO-FCM, ICA-FCM, BICA-FCM, and QBBICA-FCM, respectively. In these algorithms, each individual is represented by a vector with length 50. Each algorithm has 200 individuals, and the maximum number of iterations of each algorithm is 1000. In addition, to reveal its effectiveness in employment rate prediction, the FCM based prediction methods are also compared back propagation neural network, linear regression model and grey GM(1,1) model.
To objectively reflect the performance of the FCMs trained by the different algorithms, root mean square error (RMSE), coefficient of determination R 2 , coefficient of efficiency E sn , and mean absolute error (MAE) are used as the indices for performance evaluation. They are defined as follows: RMSE ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where y i andŷ i are the real and predicted values, respectively, and N is the total number of data points used in the experiment. Coefficient of determination: where � y and � y are the mean values of the real and predicted sequences, respectively. The larger the R 2 is, the more variability is explained by the model. Coefficient of efficiency: The larger the coefficient of efficiency is, the closer the predicted value is to the real value. Mean absolute error: In the training phase, all the algorithms are run independently for 30 times. The employment rate data from 2005 to 2017 is used as the training set and that from 2018 to 2020 as the test set. The individual corresponding to the best fitness value in the 30 runs is taken as the final weight of the FCM. With the obtained weight, numerical prediction values are calculated using Eq (23). The prediction results from the training and test phases for ReLU-FCM and Sig-FCM are listed in Tables 6 and 7, and also, they are shown in Fig 4 for the sake of intuition. The performance indices are listed in Tables 8 and 9, respectively. It can be seen that the ReLU-FCM trained by QBBICA delivers better results than other FCMs in both prediction values and performance indices. The main reason for this is that QBBICA has a stronger search ability, which allows it to better balance exploration and exploitation. Therefore, QBBICA can find more appropriate weights of FCMs. Furthermore, the ReLU activation function has no saturation phenomenon, which enables ReLu-FCM to deliver better performance than Sig-FCM. On the other hand, to illustrate the computational efficiency of QBBICA and other referenced algorithms in FCM training process, the average cpu time in 30 runs is shown in Fig  5. Obviously, one can see that the average cpu time of expanded expended by QBBICA is less than ICA, BBICA, PSO and GA, but is competitive to DE. For a more straightforward comparison, the evolution curves of those algorithms are plotted in Fig 6. From Fig 6, it can be seen that QBBICA not only has a faster convergence speed but also a stronger search ability.
In order to demonstrate the advantage of FMC in prediction, the prediction results obtained by our proposed QBBICA-FCM is compared to those by BP neural network, linear   Tables 10 and 11. From Table 10, one can see that the prediction value obtained by our QBBICA-FCM is more accurate than those by BP neural network, LR model and GM(1,1) model. Furthermore, the four performance indices of QBBICA-FCM are also better than those of BP, LR and GM (1,1). This proves that our proposed QBBICA-FCM can achieve more accurate prediction results. From the above experimental results and comparisons, one can draw a conclusion that our proposed method can give more accurate prediction results than the other methods.

Conclusion
In this paper, an improved FCM is proposed to predict the employment rate of university graduates. Two improvements are made in order to get more accurate prediction results. One is that the sigmoid activation function is replaced with the ReLu function. The other is that an improved ICA algorithm, called quasi-oppositional learning bare-bone ICA (QBBICA), is proposed to train the FCM. The QBBICA algorithm has a stronger search ability and can obtain more appropriate weights of the FCM. Experiments are conducted to predict the employment